[QNN EP] Skip inputs/outputs shape validation for QNN Batch Multiplier #26336
Conversation
Description
- QNN EP supports a batch multiplier during inference, while InferenceSession::checkShapes validates the input_output_shape against the expected_shape.
- This check is relaxed when all nodes are assigned to QNN EP and the running batch size is divisible by the original batch size.
- A separate PR will be submitted for the implementation of batch multiplier support in QNN EP.

Motivation and Context
- This change brings the QNN API's batch multiplier support to ORT, as described on this page: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-10/function_QnnGraph_8h_1a3ea05f42a9295f9a74a2e3a0cdd64228.html
@microsoft-github-policy-service agree company="Qualcomm"
```cpp
} else if (i == 0 && is_qnn_batch_multiplier_valid(input_output_shape[i], expected_shape[i], model_->MainGraph())) {
  continue;  // Qnn API supports batch multiplier, but the running batch size must be divisible by the original batch size.
```
I don't think we want to add QNN EP-specific relaxing of shape validation here. If the graph was not running on the QNN EP, would it be considered invalid? Can you elaborate on what you are trying to do?
Hi, thanks for the question.
The shape is only considered valid under QNN EP due to its support for the batch multiplier.
If the graph is not assigned to QNN EP, the original shape check logic applies, and the shape would be considered invalid.
QNN EP supports a batch multiplier, which allows the model to be compiled with a smaller batch size (e.g., 2) and then run inference with a larger batch size (e.g., 128), as long as the inference batch size is divisible by the compile batch size.
This can help reduce compile time (session creation time), while still supporting larger inference batches.
What we're trying to do is ensure that this flexibility is preserved during shape validation when the graph is intended to run on QNN EP.
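For illustration, here is a hypothetical sketch of what the is_qnn_batch_multiplier_valid helper referenced in the diff above might check, based on the behavior described in this thread (the actual implementation is in a separate PR and is not shown in this excerpt):

```cpp
#include <cstdint>

#include "core/graph/constants.h"
#include "core/graph/graph.h"

// Hypothetical sketch: relax the batch-dimension check only when the whole
// graph runs on the QNN EP and the runtime batch is an integral multiple of
// the compile-time batch. Names and placement are assumptions, not the PR's code.
bool is_qnn_batch_multiplier_valid(int64_t run_batch, int64_t expected_batch,
                                   const onnxruntime::Graph& graph) {
  // Both batches must be concrete positive values (no dynamic -1 dims here).
  if (expected_batch <= 0 || run_batch <= 0) return false;

  // Every node must be assigned to the QNN EP; otherwise keep the strict check.
  for (const auto& node : graph.Nodes()) {
    if (node.GetExecutionProviderType() != onnxruntime::kQnnExecutionProvider) {
      return false;
    }
  }

  // QNN batch multiplier: the running batch must be divisible by the
  // batch size the context was compiled with.
  return run_batch % expected_batch == 0;
}
```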
> The shape is only considered valid under QNN EP due to its support for the batch multiplier.
> If the graph is not assigned to QNN EP, the original shape check logic applies, and the shape would be considered invalid.
Thanks for clarifying.
I still think that we don't want to add QNN EP-specific handling here. Looking ahead, consider that for plugin EPs, it would be even less desirable to have similar hardcoded logic at this point.
> QNN EP supports a batch multiplier, which allows the model to be compiled with a smaller batch size (e.g., 2) and then run inference with a larger batch size (e.g., 128), as long as the inference batch size is divisible by the compile batch size.
Is it possible for the QNN EP to manage this optimization internally? E.g., identify that a smaller batch size than the one in the actual shape can be used for compilation.
Hi @edgchen1, here is a more detailed explanation of this PR.
When using onnxruntime_perf_test.exe, onnx_test_runner.exe, or any application that calls InferenceSession, the flow goes through InferenceSession::Initialize() and InferenceSession::Run().

Current Flow
- InferenceSession::Initialize()
  - Uses the source ONNX model's input shape as the expected dimensions. If a dimension in the input shape is dynamic, it is labeled as -1; otherwise, it is a positive integer.
- InferenceSession::Run()
  - Checks the input data shape:
    - If a dimension in the input shape is dynamic, the check is skipped.
    - Otherwise, the dimension must match.
  - After checking, it calls the QNN function to execute the graph.
Our Use Case
- InferenceSession::Initialize()
  - The source ONNX model's input shape is used, and we assign the batch dimension a positive integer as the base batch (e.g., 2).
- InferenceSession::Run()
  - Checks the input data shape:
    - If a dimension in the input shape is dynamic, the check is skipped. → Won't happen in our use case.
    - Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
  - After checking, it calls the QNN function to execute the graph.
Reason for Change
- As in the "dimension must match" step of our use case above, this check prevents us from enabling the batch multiplier, which would allow us to improve context preparation efficiency. For example, we could compile the model with batch=2 and support different input batch sizes (see the sketch below), rather than being limited to a fixed batch size.
- If we do not relax this check, there is no way to work around it on the QNN side, since this check occurs in the very first phase of inference.
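To make the use case concrete, here is a minimal sketch using the public ORT C++ API, assuming a hypothetical model "model_batch2.onnx" compiled with a fixed batch of 2 and tensors named "input"/"output". With the relaxed check, running with batch 8 (a multiple of 2) would pass validation when the whole graph is on the QNN EP:

```cpp
#include <onnxruntime_cxx_api.h>

#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn_batch_multiplier");
  Ort::SessionOptions so;
  // Register the QNN EP with the HTP backend (library name varies by platform).
  so.AppendExecutionProvider("QNN", {{"backend_path", "QnnHtp.dll"}});

  // Model compiled with a fixed batch of 2.
  Ort::Session session(env, ORT_TSTR("model_batch2.onnx"), so);

  // Runtime batch of 8: divisible by the compile-time batch of 2.
  std::vector<int64_t> shape{8, 224, 224, 3};
  std::vector<float> data(8 * 224 * 224 * 3, 0.0f);
  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size());

  const char* in_names[] = {"input"};
  const char* out_names[] = {"output"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &input, 1,
                             out_names, 1);
  return 0;
}
```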
> Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
I'm not convinced that what you're proposing is the behavior that we (ORT) would want. My understanding is that the input dimension at runtime should match what is specified in the static shape. If one wants to set different values for an input dimension, then it should be a dynamic dimension.
Could the QNN EP support this by allowing the "batch multiplier" dimension to be dynamic?
> Could the QNN EP support this by allowing the "batch multiplier" dimension to be dynamic?
For QNN-EP, dynamic dimension support has not been implemented yet.
https://github.com/microsoft/onnxruntime/blob/615c22bf6ba5d259c49484416d0a87a91d936e13/onnxruntime/…
The "batch multiplier" helps save preparation time by compiling the smallest batch, allowing different batch sizes to run without recompiling. However, even we enable dynamic dimensions in QNN-EP, it would not help reduce preparation time in this case.
> Our Use Case
> - InferenceSession::Initialize()
>   - The source ONNX model's input shape is used, and we assign the batch dimension a positive integer as the base batch (e.g., 2).
> - InferenceSession::Run()
>   - Checks the input data shape:
>     - If a dimension in the input shape is dynamic, the check is skipped. → Won't happen in our use case.
>     - Otherwise, the dimension must match. → This is what we don't want, because QNN supports a "batch multiplier", which allows the running input batch to be any multiple of the assigned batch number (e.g., 4, 6, 8).
>   - After checking, it calls the QNN function to execute the graph.
A shape can be dynamic or fixed. If it's fixed, the data must match. Creating a new "dimension must be a multiple of this value" state would invalidate a lot of assumptions in the code and it is not part of the ONNX spec.
Having an EP specific "trust us it's okay" piece of code in ORT core doesn't really work either. What happens if another EP wants a slightly different new state? Do we accumulate a bunch of special cases that potentially conflict with each other?
Can you set some other metadata in the model to indicate the smallest batch size? E.g., put entries in the model metadata like "batch_multiplier":"4" and "batch_multiplier_dim_name":"N" if the symbolic dimension name for the batch size is 'N'. That lets you specify the value and the dim_name it applies to.
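To make this suggestion concrete, here is a sketch of writing such metadata with the ONNX protobuf C++ bindings (the key names follow the suggestion above; how the EP would read them back is out of scope here):

```cpp
#include "onnx/onnx_pb.h"  // ONNX protobuf definitions (ModelProto, etc.)

// Record the batch-multiplier hints as model metadata instead of changing
// ORT core shape validation. The values here are illustrative.
void AddBatchMultiplierMetadata(onnx::ModelProto& model) {
  auto* mult = model.add_metadata_props();
  mult->set_key("batch_multiplier");
  mult->set_value("4");

  auto* dim = model.add_metadata_props();
  dim->set_key("batch_multiplier_dim_name");
  dim->set_value("N");  // symbolic name of the batch dimension
}
```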
Hi @skottmckay, @edgchen1,
In the batch multiplier scenario (where the compile-time batch size may differ from the runtime batch size), it appears that only ONNX models with a dynamic batch size can pass the checkShapes validation. However, since QNN EP does not support dynamic shapes, we require a static ONNX model inside the EP.
Would it be feasible to apply a free dimension override to the ONNX graph within QNN EP to convert its dynamic shape into a static shape, and then perform shape inference to ensure that the output shape is also fully static?
Given that a GraphViewer or Graph cannot be constructed inside QNN EP, we cannot replicate the ONNX graph internally within the EP.
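For reference, ORT already exposes free-dimension overrides at the session-options level; the open question above is whether an equivalent step could run inside the EP itself. A sketch of the existing public API, assuming the batch dimension is named "N":

```cpp
#include <onnxruntime_cxx_api.h>

Ort::SessionOptions so;
// Pin the symbolic batch dimension "N" to a concrete value before session
// initialization, turning a dynamic input shape into a static one.
so.AddFreeDimensionOverrideByName("N", 2);
```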
- Remove redundant code and use an existing function
- Wrap these changes in a macro so that the extra check is only triggered when building with QNN
Hi @yuslepukhin, this PR impacts a feature we are developing; it would be great if you could help review and share your thoughts on it, thanks!
- Add a SessionOptions entry for the QNN HTP batch multiplier
- Avoid validating inputs/outputs if this option is used
Hi @edgchen1, @yuslepukhin, @skottmckay,
```cpp
#ifdef USE_QNN
  const bool batch_multiplier = session_options_.config_options.GetConfigOrDefault(kOrtSessionOptionsQnnHtpBatchMultiplier, "0") == "1";
  if (!batch_multiplier) {
```
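For reference, an application would opt in through the new config key; a sketch assuming the kOrtSessionOptionsQnnHtpBatchMultiplier constant added by this PR lives in the usual session-options config-keys header:

```cpp
#include <onnxruntime_cxx_api.h>
#include "onnxruntime_session_options_config_keys.h"  // assumed location of the new key

Ort::SessionOptions so;
// Opt in to the relaxed batch-dimension validation ("0" = disabled by default).
so.AddConfigEntry(kOrtSessionOptionsQnnHtpBatchMultiplier, "1");
```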
Disabling input/output validation entirely seems a bit drastic. While it would allow your use case, it would also let other invalid cases through and potentially lead to runtime errors that are harder to debug.
Agree with @edgchen1. We should not disable the input/output validation entirely.
@qti-chuteng,
We should still validate the input and output shapes: the batch dimension at run time must be an integral multiple of the original batch dimension, while all other dimensions must match the original shape, as in the example below:
original shape: [2, 224, 224, 3]
input shape at inference time: [10, 224, 224, 3]
Here, batch(run_input_shape) % batch(orig_input_shape) == 0, and the other dimension values must match between orig_input_shape and run_input_shape.
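A minimal sketch of that rule, with a hypothetical helper name:

```cpp
#include <cstdint>

#include <vector>

// Relaxed check: dim 0 (batch) must be an integral multiple of the original
// batch; every other dimension must match exactly.
bool IsValidBatchMultiplierShape(const std::vector<int64_t>& orig,
                                 const std::vector<int64_t>& run) {
  if (orig.size() != run.size() || orig.empty()) return false;
  if (orig[0] <= 0 || run[0] <= 0 || run[0] % orig[0] != 0) return false;
  for (size_t i = 1; i < orig.size(); ++i) {
    if (orig[i] != run[i]) return false;
  }
  return true;
}

// From the example above:
//   IsValidBatchMultiplierShape({2, 224, 224, 3}, {10, 224, 224, 3}) -> true
//   IsValidBatchMultiplierShape({2, 224, 224, 3}, {10, 224, 112, 3}) -> false
```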